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Abstract. In the scope of discrete finite-state models of interacting 
components, we present a novel algorithm for identifying sets of local 
states of components whose activity is necessary for the reachability of 
a given local state. If all the local states from such a set are disabled in 
the model, the concerned reachability is impossible. 
Those sets are referred to as cut sets and are computed from a particular 
abstract causality structure, so-called Graph of Local Causality, inspired 
from previous work and generalised here to finite automata networks. 
The extracted sets of local states form an under-approximation of the 
complete minimal cut sets of the dynamics: there may exist smaller or 
additional cut sets for the given reachability. 

Applied to qualitative models of biological systems, such cut sets provide 
potential therapeutic targets that are proven to prevent molecules of 
interest to become active, up to the correctness of the model. Our new 
method makes tractable the formal analysis of very large scale networks, 
as illustrated by the computation of cut sets within a Boolean model of 
biological pathways interactions gathering more than 9000 components. 



1 Introduction 

With the aim of understanding and, ultimately, controlling physical systems, 
one generally constructs dynamical models of the known interactions between 
the components of the system. Because parts of those physical processes are 
ignored or still unknown, dynamics of such models have to be interpreted as an 
over-approximation of the real system dynamics: any (observed) behaviour of 
the real system has to have a matching behaviour in the abstract model, the 
converse being potentially false. In such a setting, a valuable contribution of 
formal methods on abstract models of physical systems resides in the ability to 
prove the impossibility of particular behaviours. 

Given a discrete finite-state model of interacting components, such as an au- 
tomata network, we address here the computation of sets of local states of com- 
ponents that are necessary for reaching a local state of interest from a partially 
determined initial global state. Those sets are referred to as cut sets. Informally, 
each trace leading to the reachability of interest has to involve, at one point, 
at least one local state of a cut set. Hence, disabling in the model all the local 



states referenced in one cut set should prevent the occurrence of the concerned 
reachabihty from dehmited initial states. 

Applied to a model of a biological system where the reachability of inter- 
est is known to occur, such cut sets provide potential coupled therapeutic tar- 
gets to control the activity of a particular molecule (for instance using gene 
knock-in/out). The contrary implies that the abstract model is not an over- 
approximation of the concrete system. 

Contribution. In this paper, we present a new algorithm to extract sets of local 
states that are necessary to achieve the concerned reachability within a finite 
automata network. Those sets are referred to as cut sets, and we limit ourselves 
to A'^-sets, i.e. having a maximum cardinality of N. 

The finite automata networks we are considering are closely related to 1- 
safe Petri nets [T] having mutually exclusive places. They subsume Boolean and 
discrete networks |2l3l4l5j . synchronous or asynchronous, that are widely used 
for the qualitative modelling of biological interaction networks. 

A naive, but complete, algorithm could enumerate all potential candidate 
A^-sets, disable each of them in the model, and then perform model-checking 
to verify if the targeted reachability is still verified. If not, the candidate A^- 
set is a cut set. This would roughly leads to tests, where m is the total 
number of local states in the automata network. Considering that the model- 
checking within automata networks is PSPACE-complete [6J , this makes such an 
approach intractable on large networks. 

The proposed algorithm aims at being tractable on systems composed of a 
very large number of interacting components, but each of them having a small 
number of local states. Our method principally overcomes two challenges: pre- 
vent a complete enumeration of candidate A-sets; and prevent the use of model- 
checking to verify if disabling a set of local states break the concerned reacha- 
bility. It inherently handles partially-determined initial states: the resulting cut 
A^-set of local states are proven to be necessary for the reachability of the local 
state of interest from any of the supplied global initial states. 

The computation of the cut A^-sets takes advantage of an abstraction of the 
formal model which highlights some steps that are necessary to occur prior to 
the verification of a given reachability properties. This results in a causality 
structure called a Graph of Local Causality (GLC), which is inspired by [7], and 
that we generalise here to automata networks. Such a GLC has a size polynomial 
with the total number of local states in the automata network, and exponen- 
tial with the number of local states within one automata. Given a GLC, our 
algorithm propagates and combines the cut A^-sets of the local states referenced 
in this graph by computing unions or products, depending on the disjunctive 
or conjunctive relations between the necessary conditions for their reachability. 
The algorithm is proven to converge in the presence of dependence cycles. 

In order to demonstrate the scalability of our approach, we perform the 
search for cut A"-sets of processes within a very large Boolean model of biological 
processes relating more than 9000 components. Despite the highly combinatorial 
dynamics, a prototype implementation manages to compute up to the cut 5-sets 



within a few minutes. To our knowledge, this is the first time such a formal 
dynamical analysis has been performed on such a large dynamical model of 
biological system. 

Related work and limitations. Cut sets are commonly defined upon graphs as 
set of edges or vertices which, if removed, disconnect a given pair of nodes fS]. 
For our purpose, this approach could be directly applied to the global transition 
graph (union of all traces) to identify local states or transitions for which the 
remove would disconnect initial states from the targeted states. However, the 
combinatorial explosion of the state space would make it intractable for large 
interacting systems. 

The aim of the presented method is somehow similar to the generation of 
minimal cut sets in fault trees ^9.10, used for reliability analysis, as the structure 
representing reachability causality contains both and and or connectors. How- 
ever, the major difference is that we are here dealing with cyclic directed graphs 
which prevents the above mentioned methods to be straightforwardly applied. 

Klamt et al. have developed a complete method for identifying minimal cut 
sets (also called intervention sets) dedicated to biochemical reactions networks, 
hence involving cycling dependencies . This method has been later generalised 
to Boolean models of signalling networks [12] . Those algorithms are mainly based 
on the enumeration of possible candidates, with techniques to reduce the search 
space, for instance by exploiting symmetry of dynamics. Whereas intervention 
sets of |11|12| can contain either local states or reactions, our cut sets are only 
composed of local states. 

Our method follows a different approach than [11112) by not relying on can- 
didate enumeration but computing the cut sets directly on an abstract structure 
derived statically from the model, which should make tractable the analysis of 
very large networks. The comparison with [T2] is detailed in Subsect. 14.11 

In addition, our method is generic to any automata network, but relies on an 
abstract interpretation of dynamics which leads to under-approximating the cut 
sets for reachability: by ignoring certain dynamical constraints, the analysis can 
miss several cut sets and output cut sets that are not minimal for the concrete 
model. Finally, although we focus on finding the cut sets for the reachability of 
only one local state, our algorithm computes the cut sets for the (independent) 
reachability of all local states referenced in the GLC. 

Outline. Sect. [2] introduces a generic characterisation of the Graph of Local 
Causality with respect to automata networks; Sect. [3] states and sketches the 
proof of the algorithm for extracting a subset of Ai"-sets of local states necessary 
for the reachability of a given local state. Sect. [1] discusses the application to sys- 
tems biology by comparing with the related work and applying our new method 
to a very large scale model of biological interactions. Finally, Sect. O discusses 
the results presented and some of their possible extensions. 

Notations. A and V are the usual logical and and or connectors. [l;n] = 
{1, • ■ • ,n}. Given a finite set A, #A is the cardinality of A; p{A) is the power 



set of A; p-^{A) is the set of all subsets of A with cardinality at most N. Given 
sets A^,...,A^, UiG[in] union of those sets, with the empty union 

IJ0 0; and A^ x ■ ■ ■ x A" is the usual Cartesian product. Given sets of sets 

B\...,B'' e p{p{A)), ri,;G[i;n]-S' G p{p{A)) is the sets of sets 

product where {ei, . . . , e„}x{e'j, . . . , e^} = {e.^ U e^- | i £ [1; A j G [1; m]}. In 
particular V(i,j) G x [l;m], B'xB-'' = B^xB' and 0xB* 0. The empty 

sets of sets product Y[$ = {0}- If M : A n> _B is a mapping from elements in A 
to elements in B, M{a) is the value in B mapped to a G A; M{a t-^ b} is the 
mapping M where a G A now maps to b G B. 

2 Graph of Local Causality 

We first give basic definitions of automata networks, local state disabling, context 
and local state reachability; then we define the local causality of an objective 
(local reachability), and the Graph of Local Causality. A simple example is given 
at the end of the section. 

2.1 Finite Automata Networks 

We consider a network of automata (S, S, C, T) which relates a finite number of 
interacting finite state automata S (Def. [1]) . The global state of the system is 
the gathering of the local state of composing automata. A transition can occur 
if and only if all the local states sharing a common transition label i £ L are 
present in the global state s G S' of the system. Such networks characterize a 
class of 1-safe Petri Nets [IJ having groups of mutually exclusive places, acting as 
the automata. They allow the modelling of Boolean networks and their discrete 
generalisation, having either synchronous or asynchronous transitions. 

Definition 1 (Automata Network {U, S, C, T)). An automata network is de- 
fined by a tuple (S, S, £, T) wtiere 

— S = {a, 6, . . . , z} is the finite set of automata identifiers; 

— For any a G S, S{a) = [1; ka] is the finite set of local states of automaton a; 
S — riaei^t-'^' ^^j] finite set of global states. 

— C ~ {£1, . . . ,im} is the finite set of transition labels; 

— T = {a Tq I a G S}, where Va G S^Ta C [l;fca] x Cx [l;^^], is the 
mapping from automata to their finite set of local transitions. 

£ A i /I i 

We note i — > j G T{a) 4^ {i,l,j) G Ta and ai ^ aj G T ^ i ^ j E T{a). 

VI G C, we note 'i = {oi \ Oi aj G T{a)} and £' = {aj \ ai — > aj G T(a)}. 

The set of local states is defined as LS = {oi | a G 17 A i G [1; ka]}. 
The global transition relation — >-C S x S is defined as: 

s s' 4> 3^ G £ :Vaj G '£, s{a) = Oj A Voj G £',s'{a) = aj 

AV6 G S, S{b) n V = ^ s{b) = s'{b). 



Given an automata network Sys = {S, S, C, T) and a subset of its local states 
Is C LS, Sys Is refers to the system where all the local states Is have been 
disabled, i.e. they can not be involved in any transition (Def. 

Definition 2 (Process Disabling). Given Sys — {S, S, C,T) and Is S p(LS), 

Sys els^ {S, S, T') where C ^ {£ e C \ Is D'i ^ 9} and T' ^ {a^ 4 aj G 
T I £e £'}. 

From a set of acceptable initial states delimited by a context (, (Def. [3]) , we say 
a given local state aj € LS is reachable if and only if there exists a finite number 
of transitions in Sys leading to a global state where aj is present (Def. 0]). 

Definition 3 (Context Given a network {S, S, C,T), a context q is a map- 
ping from each automaton a E S to a non-empty subset of its local states: 
Vae i:,^(a) G p(5(a)) A^(a) 7^0. 

Definition 4 (Process reachability). Given a network {S,S,C,T) and a 
context the local state aj G LS is reachable from if and only if3so, . . . , Sm G 
5* such that Va G S, So{a) G <r(a), and sq ^ ■ ■ ■ ^ s^, and Sm(a) = j ■ 

2.2 Local Causality 

Locally reasoning within one automaton a, the global reachability of Oj from ? 
can be expressed as the reachability of aj from a local state ai G ^(a). This local 
reachability specification is referred to as an objective noted Oi — ?>* Oj (Def. [5]) 

Definition 5 (Objective). Given a network (S, S, C,T), the reachability of 
local state Oj from a; is called an objective and is denoted ai Oj . The set of 

all objectives is referred to as Obj ^ {ai^*aj \ {ai,aj) G LS x LS}. 

Given an objective P — Oi-^* aj G Obj, we define sol(P) the local causality 
of P (Def. |6|): each element Is G sol(P) is a subset of local states, referred to 
as a (local) solution for P, which are all involved at some times prior to the 
reachability of Oj. sol(P) is sound as soon as the disabling of at least one local 
state in each solution makes the reachability of Oj impossible from any global 
state containing (Property [T]) . Note that if sol(P) — {{ai} U /s^, . . . , Zs"'} is 
sound, sol'(P) = {/s^, . . . , Zs™} is also sound. sol(ai — s>* a^) = implies that a^ 

can never be reached from a^, and Va^ G LS, sol(ai — a^) == {0}. 

Definition 6. sol : Obj H> p(p(LS)) is a mapping from objectives to sets of 
sets of local states such that VP G Obj, V^s G sol(P), '^Is' G sol(P), Is ^ Is' such 

that Is' C Is. The set of these mappings is noted Sol = {(P, Is) \ Is G sol(P)}. 

Property 1 (sol soundness) . sol(ai aj) = {Is^, . . . , Zs"} is a sound set of solu- 
tions for the network Sys = [S , S, L,T) if and only if Vfc/s G Hieli nj^'^*' '^^ 
not reachable in Sys kls from any state s G S" such that s{a) — i. 



In the rest of this paper we assume that Property [T] is verified, and consider 
sol computation out of the scope of this paper. 

Nevertheless, we briefly describe a construction of a sound so\{ai —i'* aj) for 
an automata network (S, S, C, T); an example is given at the end of this section. 
This construction generalises the computation of GLC from the Process Hitting 
framework, a restriction of network of automata depicted in |7]. For each acyclic 

sequence a.; — ^ . . . aj of local transitions in T(a), and by defining exta(^) = 

{bj eLS \bj ^ bk eT,b ^ a}, we set Is G Y[e<E{ei,....e^\ext^{e)=^<li}^^'^a{i) ^ e 
sol(ai Uj), up to sursets removing. One can easily show that Property[T]is ver- 
ified with such a construction. The complexity of this construction is exponential 
in the number of local states of automata and polynomial in the number of au- 
tomata. Alternative constructions may also provide sound (and not necessarily 
equal) sol. 

2.3 Graph of Local Causality 

Given a local state Uj G LS and an initial context the reachability of 
is equivalent to the realization of any objective a^— >*aj, with G <?(a). By 
definition, if aj is reachable from <j, there exists Is € sol(ai — ?>* a^) such that, 
Vfofc £ Is, bk is reachable from 

The (directed) Graph of Local Causality (GLC, Def. [7|) relates this recursive 
reasoning from a given set of local states w C LS by linking every local state aj to 
all objectives aj, ai S <;(a); every objective P to its solutions {P, Is) G Sol; 

every solution {P, Is) to its local states b^ £ Is. A GLC is said to be valid if sol 
is sound for all referenced objectives (Property [2]) . 

Definition 7 (Graph of Local Causality). Given a context <r and a set of 

local states u C LS, the Graph of Local Causahty (GLC) = (F^", E^), with 
C LS U Obj U Sol and C x , is the smallest structure satisfying: 

flj e T/"^ n LS {(flj, a-i a^) \ a-j £ ?} C £'^" 
ai^*aj ^V^ r\Ohi {{a^^* aj,{ai^* a.j,ls)) \ {ai^*aj,ls) £ Sol} C 
(P, Is) e V^" n Sol ^ {{{P, Is), a,) I a, £ Is} C E^ . 

Property 2 (Valid Graph of Local Causality). A GLC is valid if, VP G n 
Obj, sol(P) is sound. 

This structure can be constructed starting from local states in lo and by 
iteratively adding the imposed children. It is worth noticing that this graph can 
contain cycles. In the worst case, = #LS + #Obj + #Sol and #£'^ = 

#Obj + #Sol + E(p.z,)esoi#^«- 



bi &3 



> Ci C2 




Fig. 1. Example of Graph of Local Causality that is valid for the automata network 
defined in Example [1] 



Example 1. Fig. [T] shows an example of GLC. Local states are represented by 
boxed nodes and elements of Sol by small circles. 

For instance, such a GLC is valid for the followiirg automata network [S, S, C, T), 
with initial context = {a M> {!}; b {!}; c H- {1, 2}; d H- {2}}: 



s = 


{a, b, c, d} 


£ 


= {4,4,^3,4,4,4} 


5(a) = 


[i;3] 


T{a) 


= {1^2;2^3;1^ 


5(6)- 


[i;3] 


m 


= {1^2; 1^3;!^ 


5(c) ^ 


[i;2] 


T{c) 


= {1 ^ 2;2 ^ 1} 


5(d) = 


[i;2] 


T{d) 


= {1 2;2 1} 



£4, 



2} 



For example, within automata a, there are two acyclic sequences from 1 to 
3: 1 2 ^ 3 and 1 ^ 3. Hence, if 03 is reached from ai, then necessarily, one 
of these two sequences has to be used (but not necessarily consecutively). For 
each of these transitions, the transition label is shared by exactly one local state 
in another automaton: &i, C2, 63 for €2, ^3, ^1, respectively. Therefore, if 03 is 
reached from ai, then necessarily either both 61 and C2, or 63 have been reached 
before. Hence sol(ai a^) = {{&i, C2}, {^3}} is sound, as disabling either bi and 
63, or C2 and 63, would remove any possibility to reach 03 from ai. 



3 Necessary Processes for Reachability 



We assume a global sound GLC A'^ — {V^^,E^), with the usual accessors for 
the direct relations of nodes: 

children : 1/'^ ^ p(V^^") parents : 1/" ^ p(V^,") 

children(n) ^ {m e \ {n,m) e E^} parents(n) = {m e t/" | (m, n) G E^} 

Given a set of local states Obs C LS, this section introduces an algorithm 
computing upon the set Y{ai) of minimal cut iV-sets of local states in Obs 
that are necessary for the independent reachability of each local state Ui G 
LS n . The minimality criterion actually states that V/s G V(ai), there is no 
different Is' G V(ai) such that Is' C Is. 

Assuming a first valuation V (Def. associating to each node a set of (min- 
imal) cut A^-sets, the set of cut A^-sets for the node n can be refined following 
update(V,n) (Def.©: 

— if n is a solution (P, Is) G Sol, it is sufficient to prevent the reachability of 
any local state in Is to cut n; therefore, the cut A^-sets results from the union 
of the cut A^-sets of n children (all local states). 

— If n is an objective P G Obj, all its solutions (in sol(P)) have to be cut in 
order to ensure that P is not realizable: hence, the cut A^-sets result from 
the product of children cut A-sets (all solutions) . 

— If n is a local state a^, it is sufficient to cut all its children (all objectives) to 
prevent the reachability of from any state in the context <;. In addition, if 
a-i G Obs, {ai} is added to the set of its cut A^-sets. 

Definition 8 (Valuation V). A valuation V : V;" p{p'^^ (Obs)) IS a map- 
ping from each node of A^ to a set of N -sets of local states. Val is the set of all 
valuations. Yq G Val refers to the valuation such that yn G ,Yo{n) = 0. 

Definition 9 (update : Val x T^" t-^ Val). 



update(V, n) = < 



V{n ^ C^(y™echi,dren(„) V(m))} tfuG Sol 

nn ^ C^(n^echi,dren(„)V(m))} zf n € Obj 

nn ^ C^(n™echi,dren(n)V(m))} « G LS \ Obs 

W{n ^ C{{{a,}} U n™echiidren(„)V(m))} n G LS n Obs 



where C^({ei, . . . ,e„}) = {e, | i G [l;n] A #6^ < N A$j E [l;n],j ^ i,ej C ej, 
Ci being sets, Vi G 

Starting with Vq, one can repeatedly apply update on each node of A'^ to re- 
fine its valuation. Only nodes where one of their children value has been modified 
should be considered for updating. 

Hence, the order of nodes updates should follow the topological order of the 
GLC, where children have a lower rank than their parents (i.e., children are 



Algorithm 1 ^"-Minimal-Cut-NSets 



1 


M ^ 


2 


Vo 


3 


while >1 7^ do 


4 


n <— argmmmgjV({rank(m)} 


5 


M^M\{n} 


6 


V' update(V,n) 


7 


if V'(n) / V(ri) then 


8 


yVf ^ U parents(n) 


9 


end if 


10 


V 


11 


end while 


12 


return V 



treated before their parents). If the graph is actually acyclic, then it is sufficient 
to update the value of each node only once. In the general case. i.e. in the 
presence of Strongly Connected Components (SCCs) — nodes belonging to the 
same SCC have the same rank — , the nodes within a SCC have to be iteratively 
updated until the convergence of their valuation. 

Algorithm [1] formalizes this procedure where rank(r7,) refers to the topological 
rank of n, as it can be derived from Tarjan's strongly connected components 
algorithm [T3J, for example. The node n G to be updated is selected as being 
the one having the least rank amongst the nodes to update (delimited by A^). 
In the case where several nodes with the same lowest rank are in A^, they can be 
either arbitrarily or randomly picked. Once picked, the value of n is updated. If 
the new valuation of n is different from the previous, the parents of n are added 
to the list of nodes to update (lines 6-8 in Algorithm [T]) . 

Lemma [T] states the convergence of Algorithm [1] and Theorem [1] its correct- 
ness: for each local state G fl LS, each set of local states kls G V(ai) 
(except {fli} singleton) references the local states that are all necessary to reach 
prior to the reachability of from any state in i;. Hence, if all the local states 
in kls are disabled in Sys, ai is not reachable from any state in 

Lemma 1. ^"-Minimal-Cut-NSets always terminates. 

Proof. Remarking that plp-'^ {Obs}) is finite, defining a partial ordering such 

that yv,v' € p{p^'^{Obs)),v >r w ' 4> C{v) = ('^ {v U v'), and noting V'= G Val 
the valuation after k iterations of the algorithm, it is sufficient to prove that 
V'=+i(n) y V'=(n). Let us define vi,V2,v[,v'2 G p{p^'^{Obs)) such that vi h v[ 
and V2 We can easily check that wi U t;2 t'i U V2 (hence proving the case 

when n G Sol). As C^(wi) = C^{vi U v{) Ve^ G v{,3ei € vi : ei C e[, we 
obtain that V(e'i,e2) G x ^2,3(61,62) £ vi x V2 : ei C e'l A 62 C Cj. Hence 
ei Ue2 C e'iUe'2, therefore {vi xv2Uv[xv2) = C'^ {vi XU2), i.e. vi xv2 h v'iXv'2; 
which proves the cases when n G Obj U LS. 



Node 
(6i^*6i,0) 
6i->*bi 
bi 

di — 5«* d2 
d2 

(6i->*63,{d2}) 
bz 

{ai^* as,{bs}) 

C2 ->* C2 
C2 

(ai ^* as, {61, C2}) 
ffli as 
az 

(ci^*C2, {as}) 



rank 
~T" 

2 
3 



5 
6 
7 
8 
9 
10 
11 
12 
13 
13 
13 
13 
13 



{{bi}} 

{{fei}} 

{{&i}} 

{{&i},{d2}} 

{{&i},{d2}} 

{{fei},{d2}} 

{{6l},{&3},{rf2}} 
{{fel},{63},{rf2}} 



{{C2}} 
{{&1},{C2}} 
{{&l},{&3,C2},{c2,d2}} 
{{as}, {&l}, {&3, C2}, {C2, rf2}} 

{{"3}, {bi}, {63, C2}, {e2, ^2}} 



Table 1. Result of the execution of Algorithm [T] on the GLC in Fig. [T] 



Theorem 1. Given a GLC = {V^ ,E'^) which is sound for the automata 
network Sys, the valuation V computed by ^^-Minimal-Cut-NSets verifies: 
Voi G LSnV^", VfcLs e V(ai)\{{ai}}, Oj is not reachable from <^ within SysQkls. 

Proof. By recurrence on the valuations V: the above property is true at each 
iteration of the algorithm. 

Example 2. Table[T]details the result of the execution of Algorithm[T]on the GLC 
defined in Fig. [1] Nodes receive a topological rank, identical ranks implying the 
belonging to the same SCC. The (arbitrary) scheduling of the updates of nodes 
within a SCC follows the order in the table. In this particular case, nodes are 
all visited once, as ¥((c2 C2, 0)) xV((ci -J-* C2, {03})) = (hence update(V, C2) 
does not change the valuation of C2). Note that in general, several iterations of 
update may be required to reach a fixed point. 

It is worth noticing that the GLC abstracts several dynamical constraints in 
the underlying automata networks, such as the ordering of transitions, or the 
synchronous updates of the global state. In that sense, GLC over-approximates 
the dynamics of the network, and the resulting cut sets are under-approximating 
the complete cut sets of the concrete model. 



4 Application to Systems Biology 

Automata networks, as presented in Def. [TJ subsume Boolean and discrete net- 
works, synchronous and asynchronous, that are widely used for the qualitative 
modelling of dynamics of biological networks [2|3|14|4|5|15|16| . 



A cut set, as extracted by our algorithm, informs that at least one of the 
component in the cut set has to be present in the specified local state in order 
to achieve the wanted reachability. A local state can represent, for instance, 
an active transcription factor or the absence of a certain protein. It provides 
potential therapeutic targets if the studied reachability is involved in a disease 
by preventing all the local states of a cut set to act, for instance using gene 
knock-out or knock-in techniques. 

We first discuss and compare our methodology with the intervention sets 
analysis within biological models developed by S. Klamt et ai, and provide 
some benchmarks on a few examples. 

Thanks to the use of the intermediate GLC and to the absence of candidate 
enumeration, our new method makes tractable the cut sets analysis on very 
large models. We present a recent application of our results to the analysis of a 
very large scale Boolean model of biological pathway interactions involving 9000 
components. To our knowledge, this is the first attempt of a formal dynamical 
analysis on such a large scale model. 

4.1 Related Work 

The general related work having been discussed in Sect. [1] we deepen here the 
comparison of our method with the closest related work: the analysis of Inter- 
vention Sets (IS) In the scope of Boolean models of signalling networks, an 
IS is a set of local states such that forcing the components to stick at these local 
states ensures that the system always reaches a fixed point (steady state) where 
certain target components have the desired state. Their method is complete, 
i.e., all possible ISs are computed; and contrary to our, allow the specification 
of more than one local state for the target state of the intervention. 

Nevertheless, the semantics and the computation of ISs have some key dif- 
ferences with our computed cut sets. First, they focus only on the reachability 
of (logical) steady states, which is a stronger condition than the transient reach- 
ability that we are considering. Then, the steady states are computed using 
a three-valued logic which allows to cope with undefined (initial) local states, 
but which is different from the notion of context that we use in this paper for 
specifying the initial condition. 

Such differences make difficult a proper comparison of inferred cut sets. We 
can however expect that any cut sets found by our method has a corresponding 
IS in the scope of Boolean networks when the initial context actually specifies a 
single initial state. 

To give a practical insight on the relation between the two methods, we 
compare the results for two signalling networks, both between a model specified 
with CellNetAnalyser [17] to compute ISs and a model specified in the Process 
Hitting framework, a particular restriction of asynchronous automata networks 
|18| . to compute our cut sets. Process Hitting models have been built in order 
to over-approximate the dynamics considered for the computation of IStH. 



^ Models and scripts available at |http: //loicpauleve .name/cutsets . tbz2| 



Tcell. Applied to a model of the T-cell receptor signalling between 40 compo- 
nents [12], we are interested in preventing the activation of the transcription 
factor API. For an instance of initial conditions, and limiting the computations 
to 3-sets, 31 ISs have been identified (28 1-sets, 3 2-sets, 3-set), whereas our 
algorithm found 29 cut sets (21 1-sets and 8 2-sets), which are all matching an 
IS (23 are identical, 6 strictly including ISs). ISs are computed in 0.69s while 
our algorithm under-approximates the cut sets in O.Q06s. Different initial states 
give comparable results. 

Egfr. Applied to a model of the epidermal growth factor receptor signalling 
pathway of 104 components [TS], we are interested in preventing the activation 
of the transcription factor API. For an instance of initial conditions, and limiting 
the computations to 3-sets, 25 ISs have been identified (19 1-sets, 3 2-sets, 3 3- 
sets), whereas our algorithm found 14 cut sets (14 1-sets), which are all included 
in the ISs. ISs are computed in 98s while our algorithm under-approximates the 
cut sets in 0.004s. Different initial states give comparable results. 

As expected with the different semantics of models and cut sets, resulting ISs 
matches all the cut sets identified by our algorithm, and provides substantially 
more sets. The execution time is much higher for ISs as they rely on candi- 
date enumeration in order to provide complete results, whereas our method was 
designed to prevent such an enumeration but under-approximates the cut sets. 

In order to appreciate the under-approximation done by our method at a 
same level of abstraction and with identical semantics, we compare the cut sets 
identified by our algorithm with the cut sets obtained using a naive, but com- 
plete, computation. The naive computation enumerates all cut set candidates 
and, for each of them, disable the local states in the model and perform model- 
checking to verify if the target local state is still reachable. In the particular case 
of these two models, and limiting the cut sets to 3 and 2-sets respectively for the 
sake of tractability, no additional cut set has been uncovered by the complete 
method. Such a good under-approximation could be partially explained by the 
restrictions imposed on the causality by the Process Hitting framework, making 
the GLC a tight over-approximation of the dynamics [7]. 

4.2 Very Large Scale Application to Pathway Interactions 

In order to support the scalability of our approach, we apply the proposed algo- 
rithm to a very large model of biological interactions, actually extracted from the 
PID database |20j referencing various influences (complex formation, inductions 
(activations) and inhibitions, transcriptional regulation, etc.) between more than 
9000 biological components (proteins, genes, ions, etc.). 

Amongst the numerous biological components, the activation of some of them 
are known to control key mechanisms of the cell dynamics. Those activations are 
the consequence of intertwining signalling pathways and depend on the environ- 
ment of the cell (represented by the presence of certain entry-point molecules). 



Model 


N 


Visited nodes 


Exec, time 


Nb. of resulting N-sets 


SNAILi 


pl5INK4bi 


p21CIPli 


whole PID OR 


1 


29022 


0.9s 


1 


1 


1 


2 


36602 


1.6s 


+6 


+6 


+0 


3 


44174 


5.4s 


+0 


+92 


+0 


4 


54322 


39s 


+30 


+60 


+0 


5 


68214 


8.3m 


+90 


+80 


+0 


6 


90902 


2.6h 


+930 


+208 


+0 



Table 2. Number of nodes visited and execution time of the search for cut A'^-sets of 
3 local states. For each N, only the number of additional N-sets is displayed. 



Uncovering the environmental and intermediate components playing a major 
role in these signalling dynamics is of great biological interest. 

The full PID database has been interpreted into the Process Hitting frame- 
work, a subclass of asynchronous automata networks, from which the derivation 
of the GLC has been addressed in previous work [7J . The obtained model gathers 
components representing either biological entities modelled as boolean value (ab- 
sent or present), or logical complexes. When a biological component has several 
competing regulators, the precise cooperations are not detailed in the database, 
so we use of two different interpretations: all (resp. one of) the activators and 
none (resp. all but one of) the inhibitors have to be present in order to make 
the target component present. This leads to two different discrete models of PID 
that we refer to as whole_PID_AND and whole_PID_OR, respectively. 

Focusing on whole_PID_OR, the Process Hitting model relates more than 
21000 components, either biological or logical, containing between 2 and 4 local 
states. Such a system could actually generate 2'^'^*^^ states. 3136 components act 
as environment specification, which in our boolean interpretation leads to 2"^^'^^ 
possible initial states, assuming all other components start in the absent state. 

We focus on the (independent) reachability of active SNAIL transcription 
factor, involved in the epithelial to mesenchymal transition [21j . and of active 
pl5INK4b and p21CIPl cyclin-dependent kinase inhibitors involved in cell cycle 
regulation [22]. The GLC relates 20045 nodes, including 5671 component local 
states (biological or logical); it contains 6 SCCs with at least 2 nodes, the largest 
being composed of 10238 nodes and the others between 20 and 150. 

Table [2] shows the results of a prototype implementatior0 of Algorithm [T] 
for the search of up to the 6-sets of biological component local states. One can 
observe that the execution time grows very rapidly with N compared to the 
number of visited nodes. This can be explained by intermediate nodes having a 
large set of cut A^-sets leading to a costly computation of products. 

While the precise biological interpretation of identified A^-sets is out of the 
scope of this paper, we remark that the order of magnitude of the number of 
cut sets can be very different (more than 1000 cut 6-sets for SNAIL; none cut 

* Implemented as part of the Pint software -'http : //process .hitting .fr ee . f r | 
Models and scripts available at http: //loicpauleve .name/ cutsets . tbz2, 



6-sets for p21CIPl, except the gene that produces this protein). It supports a 
notion of robustness for the reachabihty of components, where the less cut sets, 
the more robust the reachabihty to various perturbations. 

AppUed to the whole_PID_AND model, our algorithm find in general much 
more cut A''-sets, due to the conjunctive interpretation. This brings a significant 
increase in the execution time: the search up to the cut 5-sets took Ih, and the 
6-sets leads to an out-of-memory failure. 

Because of the very large number of components involved in this model, the 
naive exact algorithm consisting in enumerating all possible A'^-sets candidates 
and verifying the concerned reachability using mo del- checking is not tractable. 
Similarly, making such a model fit into other frameworks, such as CellNet Anal- 
yser (see previous sub-section) is a challenging task, and might be considered as 
future work. 

5 Discussion 

We presented a new method to efficiently compute cut sets for the reachability of 
a local state of a component within networks of finite automata from any state 
delimited by a provided so-called context. Those cut sets arc sets of automata 
local states such that disabling the activity of all local states of a cut set guaran- 
tees to prevent the reachability of the concerned local state. Automata networks 
are commonly used to represent the qualitative dynamics of interacting biologi- 
cal systems, such as signalling networks. The computation of cut sets can then 
lead to propose potential therapeutic targets that have been formally identified 
from the model for preventing the activation of a particular molecule. 

The proposed algorithm works by propagating and combining the cut sets of 
local states along a Graph of Local Causality (GLC), that we introduce here upon 
automata networks. A GLC relates the local states that are necessary to occur 
prior to the reachability of the concerned local state. Several constructions of a 
GLC are generally possible and depend on the semantics of the model. We gave 
an example of such a construction for automata networks. That GLC has a size 
polynomial in the total number of local states in the network, and exponential in 
the number of local states within one automaton. Note that the core algorithm 
for computing the cut sets only requires as input a GLC satisfying a soundness 
property that can be easily extended to discrete systems that are more general 
than the automata networks considered here. 

The computed cut sets form an under-approximation of the complete cut sets 
as the GLC abstracts several dynamical constraints from the underlying concrete 
model. Our algorithm does not rely on a costly enumeration of the potential sets 
of candidates, and thus aim at being tractable on very large networks. 

A prototype implementation of our algorithm has been successfully applied 
to the extraction of cut sets from a Boolean model of a biological system in- 
volving more than 9000 interacting components. To our knowledge this is the 
first attempt of such a dynamical analysis for such large biological models. We 
note that most of the computation time is due to products between large sets 



of cut A^-sets. To partially address this issue, we use of prefix trees to represent 
set of sets on which we have specialized operations to stick to sets of iV-sets 
(Appendix \^ . There is still room for improvement as our prototype does not 
implement any caching or variable re-ordering. 

The work presented in this paper can be extended in several ways, notably 
with a posterior enlarging of the cut sets. Because the algorithm computes the 
cut A^-sets for each node in the GLC, it is possible to construct a posteriori cut 
sets with a greater cardinality by chaining them. For instance, let kps £ V(ai) 
be a cut A^-set for the reachability of a^, for each bj £ kps and hps' G V(6j), 
{kps \ {bj}) U kps' is a cut set for a^. In our biological case study, this method 
could be recursively applied until cut sets are composed of states of automata 
only acting for the environmental input. 

With respect to the defined computation of cut iV-sets, one could also derive 
static reductions of the GLC. Indeed, some particular nodes and arcs of the 
GLC can be removed without affecting the final valuation of nodes. A simple 
example are nodes representing objectives having no solution: such nodes can be 
safely removed as they bring no candidate A^-sets for parents processes. These 
reductions conduct to both speed-up of the proposed algorithm but also to more 
compact representations for the reachability causality. 

Finally, future work is considering the identification of necessary transitions 
(reactions), in addition to local states, that are necessary for a reachability to 
occur. The introduction of additional dynamical constraints in the GLC, such 
as confiicts or time scales, would also help to increase the number of cut sets 
identifiable by such abstract interpretation techniques. 
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A Implementation of Sets of Minimal N-Sets 



This appendix gives some details on the data structure we developed to ef- 
ficiently manipulate sets of minimal iV-sets, i.e., sets of A^-sets containing no 
sursets. The data structure is similar to prefix trees, on which operations have 
been design to perform union, product and simplification (minimisation) of 
sets of A^-sets. An OCamjf] implementation of these routines is available at 
http : //code . google . com/p/pint/source/browse/pintlib/kSets .ml 

Given a (possibly infinite) set of totally ordered elements, such as integers, 
the data structure is a forest where leafs are either _L or T, and intermediate 
nodes are elements. Each path from any root to any leaf form a strictly increasing 
sequence of elements. The maximum height of the forest is iV + 1. Fig. [2] gives 
an example of such a structure. 
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Fig. 2. Representation of set of sets {{1, 2, 4}, {1, 3}, {2, 5}} 



Data structure An instance of the data structure is either T, acting for the set 
containing the empty set ({0}), or ±, acting for the empty set, or an (ordered) 
associative map from (prefix) elements to other instances of the data structure. 
This is summarised by the following definition: 

V::=T \ ±\ {n^Vu--- ,ik^Vk} (1) 

with fe > 1 and ii < ■ ■ ■ < ik- Given V — {ii H> Pi, . . . , ifc n- Vk}, ii, . . . ,ik 
are the prefixes, and Di , . . . , T>k their corresponding sujfixes. We also note Vj G 

[f ; fc], = Dj. As mentioned, _L acts as the empty set, so any prefix mapped 

to an empty set can be removed: 

{ii ^ Pi, L, ij ^ ±, i?} = {ii ^ Vi,L, R} 

Hereafter, we assume that this removing is done implicitly. 
^ |http: //caml . inria.f r| 



Example 3. The set of sets {{1, 2, 4}, {1, 3}, {2, 5}} is encoded as 

{1 {2 {4 T}, 3 T}, 2 {5 T}} 

which corresponds to the forest in Fig. [51 

We first describe two helper functions on top of V that will be used for the 
operations. 

mvtslV) Given a data structure 2?, the inds function returns the sequence of 
prefix elements in the reverse order. If T) is either T or ±, the empty sequence 
is returned. 

VPfVjh) Given a data structure V and a level h (initially 1), UP removes all 
the sets in T) that can not be extended with one additional elements, i.e., all the 
sets with at least N elements. 



function inds(I?) 

if 7? € {T, _L} then return [] 

else if = {ii I— ^ • • • ,ik 2?fc} then 
return [z^, . . . 

end if 
end function 



function UP(D, h) 

if 2? G {T, _L} then return V 
else if h > N then return _L 
else 

for i INDS(I') do 

V{i) ^ VP{V{i),h+l) 
end for 
return "D 
end if 
end function 



Union and product The UNION and PRODUCT (realizing the x operator de- 
scribed in Sect.[T]) operations guarantee the ordering between prefixes, and that 
no set has cardinality strictly greater than N. 

UNION (Vaj'Db) Given two set of sets, this function merges the prefixes of sets. 

PRODUCT fDa, T^b, h ) Given two set of sets at level h, the product is computed as 
follows: if two sets share the same prefix, the product results from the product 
of suffixes; if two sets have different prefixes, the one having the highest prefix is 
augmented by one level (if possible) , and its product is computed with the suffix 
of the other. The result is the suf&x of the lowest prefix. 



function union (©a, 2?;,) 

if T>a = T or I>6 = T then return T 
else if "Da = J- then return Di, 
else if ©6 = _L then return Da 
else 

for i <— iNDs('Da) do 
if i € INDS(D6) then 

^ VNiON{Vaii),Vb{i)) 

else 

D(i) ^Da(i) 

end if 
end for 
return D 
end if 
end function 



function product (Da, h) 
if Da = T then return Dj 
else if Dj, = T then return Da 
else 

D ^ ± 

for i ^ iNDs(Da) do 
for j iNDs(D6) do 
it i = j then 

D,j PRODUCT(Da(i),D6(j), /l + 1) 

else if i > ji then 

"Pa/i -S- {i 1-^ UP(Da(i), h+1) 

Vij <«- PRODUCT(Da/i,D6(j), /l + 1) 

else it i < j then 

■Db/j ^ {j ^vp{Vt{j),h+l) 

Vij PRODUCT(Da(i),D5/j-,/l+ 1) 

end if 

D -s- UNION (D,Di3) 
end for 
end for 
return D 
end if 
end function 



Sursets simplification (minimisation) The above operations do not guar- 
antee that the resulting set of A^-sets is minimal, i.e., no sursets are present. 
Thanks to the ordering of elements in the forest, removing sursets of a given set 
can be done efficiently by only checking the sets having a lower prefix. 

REMOVE (V p, V, h) Given a set of sets Vp, this function removes all the sets in 
V that are sursets of sets in Vp, starting at level h. If I?p is T, D becomes the 
empty set; otherwise the process is repeated on all the suffixes having a prefix 
lower than the prefix of each set to remove. 

SIMPLIFY fT), h) At level h (initially 1), ranging the prefixes from the higher to the 
lower, each prefixed set is recursively simplified, then any sursets of previously 
computed sets are removed from it. The outputted set of A''-sets is hence minimal. 



function remove(2?p, ©, h) 
if = T then return ± 
else if V £ {T, _L} then return © 
else 

for j ^ lNDs(©p) do 
for i iNDs(r') do 
it i = j then 

V{i) REMOVE(I'p(j), h+1) 
else it i < j then 

V{i) -s- REMOVE(Dp/j, D(z), h + 1) 
end if 
end for 
end for 
return T) 
end if 
end function 



function simplify(D, h) 

if 2? € {T, _L} then return V 
else 

X>' •(- ± 

for i <— iNDs(r') do 

V/i ^ {j M> SIMPLIFY(2?(i), h + 1)} 
D/j ^ REMOVE(D',I'/i,/l) 
V UNI0N(O',©/i) 

end for 
return T>' 
end if 
end function 



